Systematic bias in high-throughput sequencing data and its correction by BEADS
نویسندگان
چکیده
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample to sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
منابع مشابه
مروری برتکنیک های توالی یابی DNA (نسل اول، نسل دوم و نسل سوم)
Introduction: The DNA sequencing is the most important technique in molecular biology by which the order of the nucleotides can be identified in a piece of DNA. There are several different methods for sequencing the DNA. Now, the DNA sequencing has great importance in the medical diagnostics and other medical fields. Some methods have been invented to speed up and increase the efficiency of the...
متن کاملDetecting and overcoming systematic bias in high-throughput screening technologies: a comprehensive review of practical issues and methodological solutions
Significant efforts have been made recently to improve data throughput and data quality in screening technologies related to drug design. The modern pharmaceutical industry relies heavily on high-throughput screening (HTS) and high-content screening (HCS) technologies, which include small molecule, complementary DNA (cDNA) and RNA interference (RNAi) types of screening. Data generated by these ...
متن کاملEstimation and correction for GC-content bias in high throughput sequencing
GC-content bias describes the dependence between fragment count (read coverage) and GC content found in high-throughput sequencing assays, particularly the Illumina Genome Analyzer technology. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation. The bias is not consistent between samples, and curre...
متن کاملHiTEC: accurate error correction in high-throughput sequencing data
MOTIVATION High-throughput sequencing technologies produce very large amounts of data and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data. RESULTS We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method t...
متن کاملTwo effective methods for correcting experimental high-throughput screening data
MOTIVATION Rapid advances in biomedical sciences and genetics have increased the pressure on drug development companies to promptly translate new knowledge into treatments for disease. Impelled by the demand and facilitated by technological progress, the number of compounds evaluated during the initial high-throughput screening (HTS) step of drug discovery process has steadily increased. As a h...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 39 شماره
صفحات -
تاریخ انتشار 2011